Declarative Information Extraction, Web Crawling, and Recursive Wrapping with Lixto

نویسندگان

  • Robert Baumgartner
  • Sergio Flesca
  • Georg Gottlob
چکیده

Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting information from Web pages using such wrappers, and for translating the extracted content into XML. This paper describes some advanced features of Lixto, such as disjunctive pattern definitions, specialization rules, and Lixto’s capability of collecting and aggregating information from several linked Web pages.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Report on the Eighth International Workshop on Knowledge Representation Meets Databases ( KRDB ) , September 15 , 2001

The Eighth International Workshop on Knowledge Representation Meets Databases (KRDB) was held at the Ponti cia Universit a Urbaniana, in Rome, right after VLDB 2001. KRDB was initiated in 1994 to provide an opportunity for researchers and practitioners from the two areas to exchange ideas and results. This year's focus was on Modeling, Querying andManaging Semistructured Data. The one day progr...

متن کامل

Visual Web Information Extraction with Lixto

We present new techniques for supervised wrapper generation and automated web information extraction, and a system called Lixto implementing these techniques. Our system can generate wrappers which translate relevant pieces of HTML pages into XML. Lixto, of which a working prototype has been implemented, assists the user to semi-automatically create wrapper programs by providing a fully visual ...

متن کامل

Declarative Web data extraction and annotation

We propose a software architecture for semantics-based annotation of data extracted from Web sources. Starting from the LiXto suite, which enables semi-automated extraction of XML data from regular documents, we present a solution for attaching background information to individual tags by means of so-called decorations. Decoration is carried out as an inferential activity in the formal context ...

متن کامل

Intelligent Wrapping from PDF Documents

Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. The semi-structured form of web pages, coupled with the availability of business-relevant data, has led to the availability of several established products on the market for wrapping data from the Web. One such approach is the Lixto me...

متن کامل

Web Information Acquisition with Lixto Suite: A Demonstration∗

We demonstrate the Lixto Suite, a web data extraction and transformation software kit for retrieving and converting information from various sources to various customer devices. With the Lixto Suite, non-technical content managers can rapidly develop applications in the areas of M-Commerce, E-Commerce, content integration and corporate portals.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001